DOCUMENT RESUME

ED 279 151                                            FL 016 174

AUTHOR       Andersson, Erik; Ostman, Jan-Ola
TITLE        Computer Processing of Swedish Syntactic Data. Some
             Preliminaries and Tentative Results.
PUB DATE     78
NOTE         39p.; In: Andersson, Erik, Ed. Working Papers on
             Computer Processing of Syntactic Data. Abo, Research
             Institute of the Abo Akademi Foundation, 1978.
             p73-108.
PUB TYPE     Reports - Descriptive (141)
EDRS PRICE   MF01/PC02 Plus Postage.
DESCRIPTORS  *Classification; Comparative Analysis; *Computational
             Linguistics; *Data Processing; Deep Structure;
             English; Finnish; Form Classes (Languages); Phrase
             Structure; Research Methodology; Sentence Structure;
             Structural Analysis (Linguistics); Surface Structure;
             *Swedish; *Syntax; Uncommonly Taught Languages

ABSTRACT 

The encoding of Swedish syntactic information in a 
study of linguistic data processing is described in detail. The 
choice of syntactic variables and classification criteria, 
illustrated with examples, is also discussed. In addition, uses of 
the coding for contrasting languages, as in this study's comparison
of Swedish, Finnish, and English word order, are explored. The coding 
key consists of 64 variables corresponding to syntactic or semantic 
properties of a clause. Each clause in the corpus is classified in a 
subcategory of one of the variables. The variables are divided into 
four groups: those identifying the clauses, those specifying some 
syntactic properties of the entire clause or its context, those 
classifying the clause constituents (verb, subject, object, 
complement, initial adverbial, adverbial in the central field, non- 
final adverbial in the end field, and final adverbial in the end 
field), and those specifying what transformations have been applied 
to the clause and what constituents they have affected. (MSE)






COMPUTER PROCESSING OF SWEDISH SYNTACTIC DATA 



Some Preliminaries and Tentative Results 



by Erik Andersson and Jan-Ola Ostman 






1. Introduction

In the academic year 1976-77, the Text Linguistics Research
Group carried out a pilot study of a system for data processing of
Swedish, Finnish and English grammar. This paper gives a brief
survey of the methodological preliminaries and some tentative results
from this study. The aim was to shed some light on the textual
factors governing word order, especially in those cases where these
factors interact with intrasentential factors and several alternative
word orders are available. The basic data were processed by means of
the computer programme described elsewhere in this volume by Kohonen
and Salmela.

The purpose of this paper is to give a fairly detailed account
of the variables used in the encoding of the Swedish material, to
justify and discuss the choice of syntactic variables and
classificatory criteria, and to give some examples of how the system
works in practice. We hope that our work will be of interest to people
carrying out similar studies or following up the present project.
Originally, the project had a contrastive purpose. By using roughly
the same coding key for Swedish, Finnish and English sentences, we
intended the computer program to bring out similarities and
differences in the word order systems of these languages. This aim
has been postponed for the time being, and in this paper we shall
only briefly refer to some contrastive applications of the coding key.
It should also be noted that the methodology presented in this paper
will be relevant for types of syntactic research other than word order
studies, as the computer programme described in the paper by Kohonen
and Salmela is applicable to the processing of any linguistic data.

The textual material used for the collection of Swedish data
consisted of extracts from a fairy-tale story by Astrid Lindgren,
Bröderna Lejonhjärta (pp. 201-207, 100 sentences), and from a young
people's novel by Bo Carpelan, Ploon (pp. 6-14, 163 sentences).
Since we were interested in how context affects the word order of
individual clauses and how contextual factors interact with internal
properties of clauses, we chose the clause as our basic coding unit,
i.e. each punch card fed into the computer contained one clause. The
total number of clauses in the pilot study was about 600. The small
amount of data itself indicates the tentative nature of this study,
and the results to be presented later in the paper should be taken,
at the most, as indications of results which could be found with the
use of a larger corpus material. We are also aware of the fact that
the results are representative only of a special genre. However, we
think that it is advisable to work out a coding system and test it out
on quite simple and regular texts.

It was necessary to revise the coding plan a number of times
during encoding, and consequently, our data had to be recoded several
times. This is another reason why it is expedient to work on a
relatively small pilot corpus. In section 2, we shall present a
coding key, which is revised on the basis of the results from the
pilot study. This section is rather detailed for two reasons. First,
we hope that it could serve as a manual for students encoding further
material, and secondly, it may give useful information on the kind of
difficulties the encoder encounters. In section 3, we give some
examples of the results that can be obtained from a study of this
kind. However, these results refer to a previous stage of the
analysis. It was partly on the basis of these results that we have
made some changes in the coding key. These changes will be indicated
in section 2. The revised coding key is presented as an appendix.

2. Towards a coding key for computer processing of Swedish

As we already mentioned, each clause constitutes a record in
our coding system. The coding key consists of 64 variables,
corresponding to syntactic or semantic properties of a clause. For each
variable, a number of subcategories are given, and each clause is
classified into one of these subcategories. For instance, variable
(52) Type of Topicalized Constituent has the following subcategories:
1 Object, 2 Object as quotation, 3 Adverbial normally placed in the
central field, 4 Adverbial normally placed in the end field, etc. If
a variable is not applicable to a clause (in this example, if no
topicalization has applied), the symbol 0 (or blank) is used.
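
The record layout described above can be illustrated with a minimal Python sketch. It is not the programme used in the project: the class `ClauseRecord`, its methods and the dictionary layout are hypothetical names chosen for illustration. Only the number of variables, the use of 0 for "not applicable" and the four listed subcategories of variable (52) come from the text.

```python
from dataclasses import dataclass, field

N_VARIABLES = 64      # size of the coding key, as stated in the text
NOT_APPLICABLE = 0    # the text uses 0 (or blank) for "not applicable"

# Subcategories of variable (52) named in the text; the key itself goes on ("etc.").
TOPICALIZED_TYPE = {
    1: "Object",
    2: "Object as quotation",
    3: "Adverbial normally placed in the central field",
    4: "Adverbial normally placed in the end field",
}


@dataclass
class ClauseRecord:
    """One clause = one record. The dict layout is an assumption of this sketch."""
    codes: dict = field(default_factory=dict)  # variable number -> subcategory

    def assign(self, variable: int, subcategory: int) -> None:
        _check(variable)
        self.codes[variable] = subcategory

    def value(self, variable: int) -> int:
        _check(variable)
        # A variable that was never coded reads as 0, i.e. "not applicable".
        return self.codes.get(variable, NOT_APPLICABLE)


def _check(variable: int) -> None:
    if not 1 <= variable <= N_VARIABLES:
        raise ValueError(f"the coding key has no variable ({variable})")


def describe_topicalization(record: ClauseRecord) -> str:
    code = record.value(52)
    if code == NOT_APPLICABLE:
        return "no topicalization"
    # Subcategories beyond the four listed in the text are reported by number only.
    return TOPICALIZED_TYPE.get(code, f"subcategory {code}")
```

A clause in which no topicalization has applied simply leaves variable (52) uncoded, and `describe_topicalization` then reports "no topicalization".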

The variables can be divided into four groups. Variables 1-6
are used to identify the clauses. Variables 7-16 and 50-51 are
used to specify some syntactic properties of the entire clause or its
context. Variables 16-49 are used to classify the constituents of
the clause, i.e. the verb, the subject, the object, the complement,
an initial adverbial, an adverbial in the central field, a non-final
adverbial in the end field, a final adverbial in the end field.¹
Some of these constituents may be missing in a clause, but a clause
may also contain constituents not covered by the coding key, e.g.
a second adverbial in the central field, or a third one in the end
field. However, the clauses in our material proved to contain very few
"extra" constituents of this kind, and it therefore did not seem
necessary to include additional variables for other types of
constituents than those mentioned above. Indirect objects might be
an exception, if we want to study when an indirect object is used
instead of a dative adverbial. The fourth group of variables is
related to the second group: variables 52-64 specify what
transformations have been applied to the clause and what constituents
they have affected.
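
The grouping can be written down as a small lookup table. The following Python sketch is illustrative only: the group labels are shorthand introduced here, not terms from the coding key, and the ranges are copied as they are given above. Since variable 16 appears in two of the stated ranges, the function returns every group that claims a variable instead of settling the question.

```python
# Variable ranges as given in the text; labels are hypothetical shorthand.
VARIABLE_GROUPS = {
    "identification": [range(1, 7)],              # variables 1-6
    "clausal": [range(7, 17), range(50, 52)],     # variables 7-16 and 50-51
    "constituent": [range(16, 50)],               # variables 16-49
    "transformational": [range(52, 65)],          # variables 52-64
}


def groups_of(variable: int) -> list:
    """Return the names of all groups whose stated ranges contain the variable."""
    return [
        name
        for name, spans in VARIABLE_GROUPS.items()
        if any(variable in span for span in spans)
    ]
```

For example, `groups_of(52)` yields the transformational group only, while `groups_of(16)` yields two groups, which makes the overlap in the stated ranges visible.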

2.1 Identification of clauses 

(1) Language of Text. This variable is needed only if texts in
different languages are analyzed. It will be necessary in a
comparative study of translations: if a clause is given the same
clause number as its translation (cf. below), a computer programme
can be constructed which will compare the desired properties of the
clause and its translation equivalent (cf. Kohonen and Salmela).
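
How such a comparison might proceed can be sketched in Python. This is an illustration under stated assumptions, not the programme of Kohonen and Salmela: the record keys `language` and `clause_no`, the language codes and both function names are hypothetical, and coded variables are assumed to be stored under their variable numbers.

```python
from collections import defaultdict


def pair_translations(records, source_lang, target_lang):
    """Pair each clause with its translation via the shared clause number.

    records: iterable of dicts with the hypothetical keys 'language' and
    'clause_no', plus coded variables stored under their variable numbers.
    """
    by_number = defaultdict(dict)
    for rec in records:
        by_number[rec["clause_no"]][rec["language"]] = rec
    pairs = []
    for number in sorted(by_number):
        versions = by_number[number]
        # Clauses without a counterpart in the other language are skipped.
        if source_lang in versions and target_lang in versions:
            pairs.append((versions[source_lang], versions[target_lang]))
    return pairs


def differing_variables(rec_a, rec_b, variables):
    """List the variables on which two records are coded differently (0 = blank)."""
    return [v for v in variables if rec_a.get(v, 0) != rec_b.get(v, 0)]
```

Calling `differing_variables` on each pair with, say, the word order variable would show where a translation departs from the coding of its original.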

(2) Identification Number of Text. This variable is especially
useful for stylistic investigations, since we can compare the
frequency of a particular phenomenon in one text with that in other



¹ For the definition of central and end fields, see the discussion of
nexusfelt and indholdsfelt in Diderichsen 1946.






One reason was that predicate phrases contain the kind of functional
elements classified by our coding key (object, complement, adverbial),
another was that it is often impossible to distinguish coordinated
clauses from coordinated predicate phrases in Finnish, where the
subject can often be deleted. However, coordinate infinitivals were
not regarded as separate clauses, and here we had to make a decision
always to classify the functional elements of the latter infinitive
later on in the coding.

2.2 Clausal variables

(7) Number of Clauses in Entire Sentence. This variable is
relevant especially in stylistic investigations, since the complexity
of sentences, in terms of number of clauses per sentence, varies
between texts. But it is also possible that this variable may affect
the inner structure of the constituent clauses: complex sentences
might favour certain transformations and word order tendencies, etc.

(8) Matrix Address of Dependent Clause. This variable specifies
the clause which is hierarchically above the one being coded.
Variables (3-5) specify the sentence to which both these clauses
belong. Variable (8) was added to the coding key after the preliminary
analyses, and the present computer programme cannot make much use of
this variable. But when the program has the necessary module
(cf. Kohonen & Salmela), it can be used to compare codings on two
different cards, i.e., in comparisons between a dependent clause and
its matrix clause.
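
A comparison between a dependent clause and its matrix clause could be sketched as follows in Python. This is not the module referred to above. It assumes, for illustration only, that variable (8) holds the clause number of the matrix clause within the same sentence and that 0 marks a clause without a matrix clause; the function name and the nested-dictionary layout are hypothetical.

```python
MATRIX_ADDRESS = 8  # variable (8) of the coding key


def compare_with_matrix(clauses, variable):
    """Compare one variable between each dependent clause and its matrix clause.

    clauses: {clause number: {variable number: code}} for ONE sentence
    (assumed layout). Returns tuples of
    (dependent clause, matrix clause, dependent code, matrix code).
    """
    rows = []
    for number in sorted(clauses):
        codes = clauses[number]
        matrix_no = codes.get(MATRIX_ADDRESS, 0)
        # Skip clauses with no matrix address or with an address that is not coded.
        if matrix_no == 0 or matrix_no not in clauses:
            continue
        rows.append(
            (number, matrix_no, codes.get(variable, 0), clauses[matrix_no].get(variable, 0))
        )
    return rows
```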

(9) Status of Clause in Sentence. All clauses were classified
as either main clauses or dependent clauses. The dependent clause was
here defined as a clause functioning as a modifier in a matrix clause.
No attention was paid to the inner structure of the clause, e.g. the
difference between main clause and dependent clause word order. The
main clauses were subdivided into non-coordinated and coordinated
clauses. The latter were further classified according to their
position in the coordination, whether initial, medial, or final. In
the provisional coding key, non-coordinated clauses and initial
coordinated clauses formed a single category.

The dependent clauses were classified as occurring before,
inside or after the rest of their matrix clause. If two clauses
were standing last in their matrix clause, they were both classified
as final, i.e. a clause is regarded as matrix-medial only if it is
surrounded by non-clausal constituents. The clauses in final
position were further divided into two subcategories: sentence-final
and not sentence-final clauses.

(10) Clausal Construction. All clauses were classified according
to their inner structure as simple (containing no dependent clauses)
or complex (containing dependent clauses). Furthermore, the clauses
were classified as non-coordinated and coordinate, and, if
coordinated, as initial, medial, or final in the coordination. To a
certain extent this repeats the information already given by variable
(9). The difference is, however, that these figures concern all the
clauses in the material, while variable (9) classified main clauses
only as to coordination. It would thus have been possible to remove
the coordination classification from variable (9) without losing too
much information, but it might be convenient to have direct access
to the figures for main clauses only, without having to filter out
the dependent ones.

(11) Clause Type. A basic distinction was made between main
clauses and dependent clauses, with a main clause now being defined
as a clause displaying main clause word order and a dependent clause
as a clause displaying dependent clause word order. This holds if
the distinction in word order is observable; otherwise typographical
criteria or the embedding criterion (cf. variable (9)) was
used here, too. Main clauses were further subdivided into main clauses
introduced by a coordinator (och 'and', men 'but', för 'since', etc.),
embedded main clauses and other main clauses. Originally, the last
category included only embedded direct quotations, but we suggest
that other embedded clauses with main clause word order should be
included, too, such as the following case:

Ser du en taxi, så ta den.

'If you see a cab, take it.'
The dependent clauses were subdivided into interrogative clauses,
att-clauses without preposition, att-clauses preceded by a
preposition, other adverbial clauses, relative clauses of three different
types (those modifying the subject, those modifying the object, and
those modifying other constituents), comparative clauses and clausal
contractions (e.g. object with infinitive). Originally, interrogative
clauses did not form a separate category but were only a
subcategory of relative clauses.

Att-clauses where att ('that') has been deleted were still
classified as att-clauses. All dependent clauses introduced by a
question word were classified as indirect interrogative clauses:

Jap li*«t uta «« h4 nine** tp«U1*» 
KandladO ON (0?,Mil) 

'I could not tell what the film was really about'

Jap Undo Nr Jm flN frm (OfJHt J) 

'I felt how I fell forward'
TlMil lit ootrl* clauit noed ml conUIn an InWrmattffi vtrbt 
not tw* • w»ii*U«Ml, frw rtlatlv* ctauiti of tf* f«! tewing 
typo, tomnr, wart treated at relative claweit 

Jag dansar var jag vill

'I dance where I want to'
(12) Surface Mood of Clause. This variable classified clauses
into affirmative, interrogative, imperative and exclamatory clauses.
Strictly formal criteria, such as the use of a question word or
subject-verb inversion in questions, the imperative form in
imperatives, etc. were decisive. Exclamatory clauses form a
heterogeneous category including greetings and thanksgivings. The coding
key will have to be diversified on this point. We could also suggest a
classification of clauses according to functional mood, i.e.
according to speech acts (cf. Hakulinen & Karlsson, forthcoming).
In that case, a new variable would have to be introduced.

(13) Clause Structure. The clauses were subclassified into
active transitive and active intransitive clauses, passive clauses
with an overt subject, clauses containing a formal subject, and
fragmentary clauses. Clauses containing a formal subject were further
subdivided into existential clauses (with an existential subject in
object position), subjectless passives of the type Det dansades
'There was some dancing', and other clauses with a formal subject,
e.g. Det regnar 'It is raining'. Examples like Det smiddes vapen
'Weapons were forged' were classified as existential clauses, like



The figures in parentheses after the examples refer to text number,
sentence number, and clause number, in that order.

( 



U* f ln»i mg| I HflE3 , '**f* # * *** ***** I* t*t ftftit', 

although it is not totally clear that vapen should be regarded as
a subject or if it should rather be analysed as an object, in which
case the clause would be treated as a passive construction with a
formal subject only.

faittwt confuting iffJi ♦ part lct»1t won c#M it *€ttrt 
iPti^H. Kmtr, thti dtetilo* li *m to %MHllf«, 

0<) jgfjg jjgl if CU*»4 . I* Iht pmlilNl coding tty 
tttr W dlfftftftt wrfatt word ordtr ottttret *tf« dfiUnf*UH*d* t*l 
co*ii**nt1y Itii frtjvoot tat*f»ri*t mm t»#*ttlft*l by *o ooro 
than g coup It tf clivm. If tht ttatttttct art to bt *i*nfngM, 
tht Hit of UM cuffirt §K«mig bt ironically enlirftd* to that • fair 
m***r of cliutti will fill MMtr mil factory* f*t* **• II Mil ho 
MM to Ma m» wty cat^forltit to Mr# thtftftro nMta#4 tho 
number of surface patterns by studying only the three main
constituents of the sentence: the subject, the verb, and one
complement (in a wide sense). The alternatives will then be the six
possible permutations of these three constituents, and also the
patterns SV, VS and V. However, there are still many unsolved
problems in this system. If there are several complements, which one
should be chosen as the basis of the classification? Is V the finite
or the main verb?
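
The reduced inventory of surface patterns can be enumerated mechanically. The Python sketch below is illustrative and rests on assumptions: the symbol X for the single complement, the reading of the shorter patterns as SV, VS and V, and both function names are not confirmed by the coding key itself.

```python
from itertools import permutations

CONSTITUENTS = ("S", "V", "X")  # X = the one complement; the symbol is assumed


def surface_patterns():
    """Six permutations of S, V and X, plus the shorter patterns SV, VS and V."""
    full = ["".join(p) for p in permutations(CONSTITUENTS)]
    return full + ["SV", "VS", "V"]


def pattern_of(labels):
    """Reduce a clause to its surface pattern.

    labels: constituent labels in clause order; anything other than S, V or X
    (e.g. adverbials) is skipped, and only the first S, V and X are kept.
    Returns None if the result is not one of the recognised patterns.
    """
    seen = []
    for label in labels:
        if label in CONSTITUENTS and label not in seen:
            seen.append(label)
    pattern = "".join(seen)
    return pattern if pattern in surface_patterns() else None
```

Keeping only the first complement is one possible answer to the question of which complement should be the basis of the classification. It is a simplification made for this sketch, not a decision taken in the study.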

Elliptical constructions also pose a problem. In this pilot
study the missing elements were indicated as if they were there,
in the position where they would be placed most conveniently.

In clauses with a formal and an existential subject the
existential subject was regarded as S, and the formal subject was
ignored. In clauses lacking a real subject the formal subject was
treated as S, but it could be ignored here, too.

Cleft constructions were classified according to the word order
of the higher sentence in the construction, e.g. Det var Johan som jag
såg 'It was John who I saw' was coded as SVX. However, the coding
would have been more consistent if the sentence had been coded as
XSV, because the higher sentence Det var was disregarded when
clause numbers (variable (6)) were assigned. Also, we are interested
here in the thematic ordering of the main constituents of the clause,
not the ordering of such formal elements as det and var, which is






tfl* tf €#*ftl*t**t rwKm 0* I ***Ut*M I* l*t fM*c*l<*t **rUIU. 
l **! cliittfitJ #i ** dfcjKt* * \m i*m \m*+\* tf ♦* ttrtffctil, TM 

V«rUllt *#i *t*M lt*ff It UNI Ctttftf My, 

T*t ^mH<N» ttit c#M*1*N * ctatlt tf wmfete* 

KftNttf It lm tf ttfftlttti t#f**| **f*ttt* fe* 0##*% tf CPtmliW 

MM in** 'ttfctir't •f *y ***** tf *tf*t'*« ;*u*t Mm^iii tit* 
*3M tta *Nr*1jf\ Till i ttriofclt *l|*i ***** to pr#*m#* If tto 
i1t 1i It ctttrtil >*4\%* uM ritMi*. Atottof mltfelt it IN 
frtvtitt**! ctiltf My t*41cM#4 On tctwrtuff tf Itttrjtctttft* 
toft**, tft*e« tf I* tf # cu«v#, Alt***!* u U f*f*lfcle 

itet #* Irttr jtttit* tr t m-itirt i^mifKi tiyii tfifftr i iof i» 
ultittlt* tr iffHt mrt trfer t* **» otMf oty, UKfi mUfclt Mi 
feww ml frtt U*t c**f*i M/ t* Hi pttttt font. 

2.3 Constituent variables
2.3.1 The verb

(II) f»r» Stmi^rt .HiU nrtttlt *mrltoi *Ml It UNUItMl 
fmmf Mi tott ctllH tto wNttttt wtf. Alt*t¥|* U* vtr** tf 
i ifMttct m*4 tot tlttyt ftm t ct*UI!ttt*t« tt tow* | f t #N Una 
tttttMf ftf tmtkol mttftl* Tto wrltfcM Ml fttf 4\ft*r**t 
Jlttotttltit t) ituplt n. totpltt, l.t. ftm CMUMItf • ttAil 
Hit tf *i*U*f tvittliry mfc, t.f. *U«tf • l^&ilS* fei£ liflttl * 
fc) *W*lt n. ftfltnlw, t.f* * Sajlfi c) iltplt **. 
i«t. wto cmfiiltf of imtil ««fti t t*t* «2£ * «li tiit i 4) ittplt 
ttr% in* wt ctott, t»#. • ot t t t t ct tf imrtl Milt ttH» f t*f . 
ttfltw * »*rJ» ttrly > MKm Umi fttf tlcMtotlti Oft ct»6\9*4> 
t tytltt c**ui*t*t ^ » II t^beottttflti li ttUltN. 

TM t^rtMl mto tytlcoll/ ctoUlt o mtol NfUclt. Hottttf, 
m Mirt olit ItcltM Idlomtlc ctorftMiltti tf vtftt it uMi 
coitttiy, t»ft 

Nit fia od» fttl. »M «o!Mtf *«4 m\U4. 9 

Mat mt tcH ilftv. 'Mo ut wHtltf 






ft****r f tl*r?*#. n« 4lt***itfft ii t* !#**! c*«itf*cttf«i #* 
ffcrttt it <*t** ** f*»ilitoi* # ******** f t***t, 

it**r **** t* fctunrtAt #frnjh*r &$r*tit**t tkmU t# 

I^W •**§»! *r ii #f * *h| 4* 

lm fcrt**r *»e<**)*§ tlntity i«f t*t wfe **** <M*i i* i*» 

*tf*rti#ft tf I** wt>, t*i*f w*§ MM 'tot*, ftf%i #f 

4*#a\A«$ ffafity* *f ft* ******* m i§ **it *** 
mi #f toft timiftotft* ** ******* to #ff*tt*ri titi*f * 
Umm t*i# it**)* iut*i* in^nnr iut#i, ttmriutttt 
femlw mnt 9 Mi iwury mts iff. Mimwi Iff! J, Si«t* 
t*t €#*J*f *rm* *1t* cAiw, «* fem #c!*N I* »*tU*i» t*e 

*ti*t tf tie** hHiMw, iVf^r i'tifti ***** t» *»f 
2.3.2 The subject

t*j*t €Mt4if4l1«9 U4#T*J*|!# Clmi) Ufejttli tftfUUiHt 

t*J*ct| i*J*et f»U**l ty * ftiiritr (mi cH«m9)i i^mt 
*m*4M fc? • »rtttl»t*t tfriiftfrf » rm4M M tUN^f 

i»#if i*n mm^n h*|ki 9 fMtiiiy *t * Mfiftt* muifj 
IWmI m4)#cU *M 4ilti#4 («n4niijr $im m 

« M*ord(mit (lww 4 if * %**i«t ^Utnlnf § pnm4m*r r*xt#r 
tM« it ** fMUlnt«4 « ^riiw»irtiNr # #U« In uk« mlNi o^iof 
tJNt iiteHf i ri fi «rt fm«H«4 fit felt ftttr*n*tf*t *r^r* 






When a clause contains both a formal subject and an existential
subject, this variable, like the other subject variables, classifies
the existential subject. The subcategory formal subject will then be
applied only to clauses containing no other subject than a formal
subject, as Det regnar 'It is raining'. Figures for all clauses
containing a formal subject will anyway to some extent be obtained from
variable (13).

(18) Subject Form. This variable is mainly concerned with the
definiteness marking of the subject. Indefinite subjects are those with
the indefinite article, with (or consisting of) an indefinite quantifier,
and without either article or quantifier. Definite subjects are those
with the definite article, with (or consisting of) a definite quantifier,
with a genitive attribute, or proper names (including words like mamma
'mummy' and pappa 'daddy'), or definite pronouns. Clauses and infinitives
etc. fall into the category of "Irrelevant". Examples of indefinite
quantifiers are några, fler, ingen(ting), vem som helst, and the numerals,
and examples of definite quantifiers are alla and de flesta. Sometimes
the context will give hints as to the choice between these categories.

Coordinated subjects are problematic, since our policy was to let
the first part of the coordination determine the classification, while
the other parts are ignored.
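
The criteria for variable (18) can be summarised as a decision procedure. The Python sketch below is illustrative only. The feature names in the input dictionary, the order in which the criteria are tested and the label "undecided" are assumptions of the sketch. The quantifier lists contain just the examples given in the text and are not exhaustive.

```python
# Only the example quantifiers mentioned in the text; not an exhaustive list.
INDEFINITE_QUANTIFIERS = {"några", "fler", "ingen", "ingenting", "vem som helst"}
DEFINITE_QUANTIFIERS = {"alla", "de flesta"}


def subject_form(subject: dict) -> str:
    """Classify one subject by definiteness (feature names are hypothetical)."""
    if subject.get("kind") in ("clause", "infinitive"):
        return "irrelevant"
    article = subject.get("article")        # "definite", "indefinite" or None
    quantifier = subject.get("quantifier")  # a quantifier word or None
    if (article == "definite"
            or quantifier in DEFINITE_QUANTIFIERS
            or subject.get("genitive_attribute")
            or subject.get("proper_name")
            or subject.get("definite_pronoun")):
        return "definite"
    if (article == "indefinite"
            or quantifier in INDEFINITE_QUANTIFIERS
            or subject.get("numeral")
            or (article is None and quantifier is None)):
        return "indefinite"
    # Cases outside the listed criteria are left to the coder and the context.
    return "undecided"


def coordinated_subject_form(conjuncts: list) -> str:
    """The first part of a coordination decides; the other parts are ignored."""
    return subject_form(conjuncts[0])
```

The second function makes the cost of the coordination policy explicit: a coordination of a proper name and an indefinite noun phrase is coded as definite simply because the proper name comes first.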

(19) Givenness of Subject, and (20) Type of Coreference of Subject.
These two variables are discussed in section 2.4, since the criteria are
the same for all nominal constituents in the sentence. Variable (21)
Number of Words in the Subject Phrase shows the length of the subject.

2.3.3 The object

(22) Object Structure. The structure of the object is comparable
to that of the subject. An example of a formal object is det in
Jag har det bra 'I am having a good time'. The category 'Gapped
subject' is parallelled by the category 'Deleted object', which covers
both ellipsis under identity and sentences with an implied object
such as Han åt 'He was eating'.

(23) Object Form. The same as for the subject. (24) Givenness
of Object, (25) Type of Coreference of Object, (26) Number of Words
in Object Phrase. Indirect objects cannot be treated properly by this
coding key. One alternative would be to encode them as complements,
another to encode them as adverbials, since they can be replaced by
dative adverbials. Cf. section 2.4.

2.3.4 The complement (Sw. predikativ)

(27) Complement Structure. Nominal complements were classified
into roughly the same categories as subjects (the categories 'Formal
subject' and 'Gapped subject' have no correspondents). Adjectival
complements were classified as coordinations, adjectives modified by
a clause, e.g. Han var så lat att ... 'He was so lazy that ...',
adjectives with other modifiers, e.g. mycket klok 'very wise',
and simple adjectives.

The provisional coding plan included a variable describing the
form of the complement (cf. the subject and the object), but this
variable was regarded as superfluous for the present purpose.

(28) Complement Type. Complements were classified as optional
or obligatory, and as specifying the subject or the object.
(29) Number of Words in Complement.

2.3.5 The adverbials

In each clause, the coding key can handle a maximum of four
adverbials. That is, we have identified four positions for adverbials
in the clause and ignore adverbials in any other position. The
existence of adverbials in other positions will generally be indicated
under the variable measuring the total number of adverbials in the
clause. The four types of adverbials are the following:

A. Initial adverbials, which according to Diderichsen (1946) are
placed in the fundament field, i.e. topicalized adverbials, potentially
preceded only by coordinating or subordinating conjunctions and
dislocated elements.

B. Central field adverbials, which in subordinate clauses precede the
finite verb, and in main clauses follow the finite verb and the subject
but precede the infinite verb and/or the object.

C-D. End-field adverbials, which follow even infinite verb forms
and objects. The coding plan can take care of two adverbials of this
category: the last but one and the last adverbial. The latter can be
followed only by dislocated elements.

If a main clause contains no infinite verb or object, central
field adverbials cannot be distinguished from end field adverbials.
In such cases, a test based on the intuitions of the coder has been
carried out: the clause has been changed into a subordinated clause,
or an infinite verb form has been added. If these tests do not
favour either position, the adverbial has been entered as an
end-field adverbial.

It should also be noted that two adverbials can be joined to
form a compound adverbial. They are then classified as one adverbial;
the criterion is whether they can be topicalized together. The
variables given below are the same for all four types of adverbials.

(30, 35, 40, 45) Adverbial Structure. The following
classifications were used: one-word adverbials; phrases not introduced
by a preposition; phrases introduced by a preposition; infinitives
(possibly preceded by a preposition); clausal contractions of some
other type; full clauses, possibly preceded by a preposition; noun
phrases or prepositional phrases containing a clause. Articles were
not counted as words.

The clausal contractions are mainly reduced comparative clauses
like: Han gråter mer än Kalle 'He weeps more than Kalle'. The
following are examples of an adverbial infinitive without a
preposition: Och en enda liten flamma räcker att förlama eller
döda ... 'And one single little flame ... is enough to paralyse
or to kill ...' (01,51:1); Han har gett oss en mur att skydda oss med
'He has given us a wall to protect ourselves with' (01,54:2).

(31, 36, 41, 46) Adverbial Type. Negations, commentatory
adverbials, like också 'also' and bara 'only' formed special
subcategories. Special clausal adverbials are often distinguished by
their ability to be combined with a noun phrase in the clause, as in
Också Kalle var där 'Kalle also was there'. Such combinations are of
course not treated under these variables, which classify elements in
adverbial positions (cf. variable (50)). Another subcategory under
this variable is "frame adverbials", i.e. place and time adverbials
which present a setting for the event of the clause. Other adverbials
are classified as optional adverbials and obligatory ones (mainly
valency adverbials).

(32, 37, 42, 47) Semantics of the Adverbial. This variable
classifies the adverbials according to the familiar categories of
time, place, manner, degree, measure, purpose, goal, condition, cause,
instrumental, and agent (in passive constructions). A special category
called "Irrelevant" is used for other types, such as negations and
special clausal adverbials, which had already been coded under the
preceding variable.
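
The variable numbers quoted for the adverbials follow a regular pattern: the same property recurs every five variables, once per adverbial position. The Python sketch below spells this out. It assumes, for illustration, that the four number series correspond to positions A to D in that order; the function name and the property labels are shorthand introduced here.

```python
SLOTS = ("A", "B", "C", "D")   # initial, central field, end field (non-final), end field (final)
PROPERTIES = ("structure", "type", "semantics")   # shorthand for the three variables above


def adverbial_variable(slot: str, prop: str) -> int:
    """Variable number for a property of one adverbial position.

    Assumes the series 30/35/40/45, 31/36/41/46 and 32/37/42/47 map onto
    positions A-D in that order.
    """
    return 30 + 5 * SLOTS.index(slot) + PROPERTIES.index(prop)
```

With this mapping, `adverbial_variable("C", "semantics")` gives 42, the third number in the series quoted for the semantic variable.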

(50) Special Clausal Adverbial Inside a Functional Element.
The classification distinguishes between adverbials in the subject,
object, complement, and another adverbial. The variable is perhaps
not essential to the coding key.

(51) Number of Adverbials in the Clause. Only adverbials
modifying the predicate verb of the clause are counted, not adverbials
inside a functional element in the clause.

2.4 Textual variables 

(19, 24, 33, 38, 43, 48) Givenness. The classification is the
same for subjects, objects, and adverbials, although the category
'Irrelevant' will normally be used for adverbials not consisting of a
noun phrase or a prepositional phrase.

This variable indicates to what extent the REFERENT of a noun
phrase has been previously given or introduced into the discourse
universe of the text. The basic classification is between noun phrases
referring to a new referent and those referring to an old referent.
The category 'given' is further subdivided into several classes. The
referent can be textually given or pragmatically given, i.e. given in
the speech situation or through the common knowledge of the
participants in the speech situation. Pragmatically given elements
formed a single category in the coding key, but were of three kinds:
performatively known noun phrases, referring to some of the
participants in the speech situation, generic noun phrases, usually
referring to the whole of a group of individuals, and noun phrases
referring to a generally known referent, e.g. solen 'the sun'.
Textually given elements were classified as mentioned in the
preceding sentence (or even in the same sentence), or as previously
mentioned. A third category might be useful, 'recently mentioned',
and could cover referents mentioned in the same passage, or referents
which have not disappeared from the scene since they were mentioned.

Indexically given referents formed an intermediate category
between textually and pragmatically given noun phrases. These referents
normally occur together with a referent which is mentioned before
(e.g. driving - the steering wheel), or form a part of an earlier
mentioned referent (car - steering wheel), or belong to a group of
elements which is given as a whole, but does not contain any previously
distinguished elements (flowers - a tulip; boys - one of the boys). The
latter category should probably form a subcategory of its own, since
it is related to the category of new referents.

The givenness of clauses and infinitives was generally regarded
as "irrelevant", since judgements otherwise would have been extremely
hard.

In coordinations of NPs the functional element as a whole is
indicated as given if ONE of the coordinated elements is given. This
also applies, mutatis mutandis, to the variables accounting for
coreference of NPs.
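
The rule for coordinated NPs can be written down as a small function.
The following is only a minimal sketch in Python, not part of the
original coding procedure; the function name and the label strings are
assumptions chosen for illustration.

```python
# Illustrative labels standing in for the 'given' subcategories
# described above (assumed spellings, not the project's own codes).
GIVEN_LABELS = {
    "just mentioned",
    "previously mentioned",
    "indexically given",
    "pragmatically given",
}


def coordination_is_given(element_labels):
    """Return True if at least ONE coordinated NP counts as given."""
    return any(label in GIVEN_LABELS for label in element_labels)


# One given element is enough for the whole functional element.
print(coordination_is_given(["new", "just mentioned"]))
print(coordination_is_given(["new", "new"]))
```

The sketch only answers whether the coordination counts as given; the
text does not say which given subcategory is recorded when the
coordinated elements differ, so that choice is left open here.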

(20, 25, 34, 39, 44, 49) Type of Coreference of the Noun Phrase.
This variable indicates in what way an element is textually bound to
the rest of the text. It mainly indicates the type of relation
between a given referent and its correlate. However, the variable or
a corresponding one could also be used to indicate similar relations
between elements inside the noun phrase and the context. That is, one
variable indicates the relationship between the referent of the noun
phrase and the context, and another one indicates the relation between
the noun phrase expression and the context.

The subcategories indicate the type of linking used: by means of
a pronoun (anaphoric, cataphoric or exophoric), or by means of the
repetition of a lexical noun, or by a paraphrase. The category
"irrelevant" includes clauses, infinitives, and textually free
elements, but the latter category could also constitute a subcategory
of its own.

2.5 Transformations 

The aim of the project was to study the motivations for the use
of a particular word order. Our hypothesis was that a transformation is
often applied to a clause in order to make the syntactically "basic"
word order fit into the context better. It was thus natural to work
within the theory of transformational grammar, and to classify
clauses according to what transformations they have undergone. We
have consequently been interested in optional transformations, mainly
movement transformations and deletions. Our goal has been to reduce
the optionality of these transformations by studying the factors which
trigger them off. However, we have concentrated on transformations
quite close to the surface, which makes it possible to avoid taking
a standpoint in the controversy between Interpretive and Generative
Semantics.

2.5.1 Movement transformations

(52) Type of Topicalized Constituent. Topicalization is defined
as the movement of a constituent to initial position without insertion
of a pause or a comma between the moved constituent and the rest of
the clause. The topicalized constituent can therefore be preceded by
coordinating or subordinating conjunctions and by dislocated
elements. Subjects were not generally indicated as topicalized, but
a special category of subject-initial clauses should probably be
introduced under this variable. Topicalized objects are of two kinds:
quotations and others. The motivation for the movement is probably
different in these two cases. Adverbials are classified as central
field adverbials and as end field adverbials, according to the coder's
intuition of where the adverbial most naturally could be placed if it
had not been topicalized. Topicalized complements and verb phrases are
less controversial subcategories. An example of a further subcategory,
'topicalized subject in an existential sentence', is the following:
Sådana finns det här 'There are ones like that here'.

The same variable can accommodate other transformations, too, if
desired. The condition is of course that the transformation is
complementary to the topicalization transformation. Yes-no questions,
where the finite verb is moved to initial position, can therefore be
recorded under a special subcategory. The transformation movement of
a relative pronoun is also complementary to topicalization, and could
even use the same subcategories. The filtering device of the computer
programme could be used to distinguish between topicalization and
relative pronoun movement: if the computer operates on a subcorpus of
main clauses only, it will indicate topicalizations. This requires that
the category main clause be defined on the basis of word order rather
than on the basis of embedding. A similar solution is not possible for
question-word movement, since this transformation applies both in main
and in subordinate clauses. This transformation therefore requires
subcategories of its own.
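
The filtering idea can be illustrated with a short sketch. It is not the
project's programme: the record layout (a dict with the keys
'clause_type' and 'fronted') is a hypothetical stand-in for variables
(11) and (52), and it assumes the shared-subcategory coding discussed
above, in which value '0' means that nothing was moved.

```python
# Values 1-3 of variable (11) are the main clause types in the coding key.
MAIN_CLAUSE_TYPES = {"1", "2", "3"}


def split_movements(cards):
    """Separate fronting codes by the clause type of the card.

    Under the shared coding assumed here, a fronted constituent in a
    main clause is read as a topicalization; in any other clause type
    it is read as relative pronoun movement.
    """
    topicalizations, relative_movements = [], []
    for card in cards:
        if card["fronted"] == "0":        # no movement recorded
            continue
        if card["clause_type"] in MAIN_CLAUSE_TYPES:
            topicalizations.append(card)
        else:
            relative_movements.append(card)
    return topicalizations, relative_movements


# Invented example cards, for illustration only.
sample = [
    {"clause_type": "3", "fronted": "1"},
    {"clause_type": "7", "fronted": "1"},
    {"clause_type": "3", "fronted": "0"},
]
tops, rels = split_movements(sample)
print(len(tops), len(rels))
```

As the text notes, the split is only as good as the definition of main
clause behind the clause-type code, and it cannot separate question-word
movement, which occurs in both clause types.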

(53) Subject Postponed to the End Field. A normal subject-verb
inversion is not recorded under this variable. What is required is
that the subject be moved to the end field, i.e. some constituent has
to intervene between the finite verb and the subject, preferably not
a short central field adverbial. The subcategories are moved subject
clause, moved subject infinitive, NP subject in existential clause
(containing the formal subject det), and other moved NP subject.

It would also be interesting to know when extraposition of the
subject has not taken place, although possible. Special categories
could therefore be introduced for non-extraposed subject clauses and
subject infinitives. Similarly, sentences in which the existential
construction is allowed by the verb and the definiteness of the
subject, but not chosen by the author, could be recorded under this
variable.

(54) Extraposition and Quantifier Movement. This variable
accounts for extrapositions of parts of the main functional elements
in the clause. The extraposed element is normally an att-clause or a
relative clause:

     Den omständigheten förvånade mig att Pelle hade varit hemma.
     'That circumstance astonished me that Pelle had been at home'

     Jag såg en man igår som hade rött hår.
     'I saw a man yesterday who had red hair.'

The subcategories are extraposition from the subject, the object, the
complement, and from an adverbial.

Quantifier Movement forms a separate category, e.g. Pojkarna var
alla där 'The boys were all there': a quantifier is moved from a noun
phrase into adverbial position. A further minor category is
topicalization of part of the object: Bilar hade vi fyra 'Cars, we had
four of', where the quantifier is left in the normal object position.



Of course, the transformations 'extraposition of a modifying
clause' and 'quantifier movement' may both apply to the same sentence.
If so, one of the transformations will be left unrecorded. However,
these transformations seem to be so infrequent that such cases will be
extremely rare and can therefore be excluded from computer processing,
to be dealt with manually. Otherwise the variable will have to be
split into two.

A special problem is created by constructions of the type V1 Adv
och V2, where V1 och V2 is analyzed as a phrasal verb. If the adverb
intuitively belongs to both verbs, V2 can be analyzed as an extraposed
verb. Here an alternative analysis has been applied: the adverb is
regarded as having been moved into the phrasal verb, or rather, the
first verb in the phrasal verb has been moved in front of the adverbial.
This is an automatic transformation in main clauses (finite verb
movement). It has not been taken into the coding key as a special
transformation.

It is not always necessary for the extraposed element to be
separated from its mother constituent by some other element. The
separating element can also be another modifier of the same constituent,
which is normally placed after the extraposed one. In the following
example också is the separating element: Det finns en vik också med
n*k busk fa* una r (02*3:!) 'There was a bay, too, with low, bush-
like pine trees'.

An example of an extraposition of a functional element would be: Jag
skulle ha det kvar, alltid. 'I would have it in me, forever' (02,140:2).

(55) Type of Cleft Construction. Since Swedish cleft constructions
correspond to Finnish word order permutations in a single clause, it
was natural to regard the cleft construction as consisting of one
clause rather than two. Support for this analysis can also be found in
the fact that the cleft construction seems to be a rather late
transformation, preserving e.g. the case marking of the single clause:
cf. Det var jag som kom 'It was I who came' and Det var mig som du såg
'It was me that you saw'. The characteristic of the cleft construction
is the presupposed som-clause. Sentences in which the som-clause is not
presupposed were therefore not analyzed as cleft constructions, e.g.
Det var mamma som ropade från köket (02,90) 'It was Mummy who shouted
from the kitchen'. Here it is not clear from the context that somebody
was shouting from the kitchen. The sentence is therefore analyzed as an
existential clause containing a non-defining relative clause. However,
the borderline between these two constructions is very unclear.

The clefted element was classified as subject, object, complement,
or adverbial. An additional subcategory was used to indicate a pseudo-
cleft construction, e.g. Vad jag tänkte på var detta 'What I was
thinking of was this'. This was possible because the cleft
transformation seems almost never to be applied to a pseudo-cleft
sentence.

(56) Left Dislocation. The basic criterion for dislocation is that
a functional element be moved outside the clause, so as to form a tone
unit of its own. This element is represented inside the clause by a
pronoun copy, which is often topicalized, e.g. På gården, där hade vi
lekt 'In the yard, there we had been playing'. The subcategories
classified the dislocated elements according to their function in the
clause. It is also possible to distinguish between topicalized and
non-topicalized objects, adverbials etc.

(57) Right Dislocation. The classification is the same as for the
preceding variable. The dislocated element is placed last in the clause,
e.g. Där hade vi lekt, på gården.

(58) Raising. The subcategories were subject raising, e.g. Jag
såg honom komma 'I saw him come'; generic object raising (or Tough-
movement), e.g. Fiolen är svår att spela på 'The violin is hard to
play'; and nongeneric object raising, e.g. Fiolen är svår för mig
att spela på 'The violin is hard for me to play'. Object raising is
really applicable to all non-subjects, and different subcategories
could be used for objects and adverbials.

Originally clauses in which raising could have applied but had
not been used were also recorded. This was later dropped as too
cumbersome, but would have provided interesting data.

(59) Passive. This variable could be used for several purposes.
One possibility is to indicate whether an agent adverbial is present
in the clause or not, and where it is placed. This information is
coded elsewhere in the coding plan, but it is convenient to be able to
pick out information from one single variable rather than collecting
it from several variables. Another possibility is to indicate whether
the passive is a bli-passive or an s-passive. A third possibility is
to record sentences constructed with the subject man 'one' under a
special subcategory.

2.5.2 Elliptical transformations

(60) Verb Phrase Deletion. This variable is used to indicate that
a verb phrase has been deleted under identity with a verb phrase in the
context (cf. Andersson 1976). The variable is not very central to the
coding key. The subcategories could be used to indicate whether the
first or the second verb phrase is deleted, and whether the deleted
verb phrase is not contained in the remaining one (outer deletion), or
is strictly speaking a part of its own correlate (inner verb phrase
deletion). We then get the following examples:

     Du får simma om du vill /.            - forward, outer
     'You may swim if you want to'

     Du får vad du vill /.                 - forward, inner
     'You get what you want'

     Om du vill / får du simma.            - backward, outer
     'If you want to, you may swim'

     Vad du vill / får du.                 - backward, inner
     'What you want, you will get'

(61) Ellipsis of Functional Elements (Gapping). An ellipsis was
recorded under this variable only if one or several elements were
deleted under identity with the corresponding elements in the context.
Pragmatic ellipses, such as subject deletion in imperative clauses,
were not indicated under this variable. In general, the elliptical
sentence should contain at least two different functional elements of
the sentence, so as to distinguish the ellipsis from a normal
coordination, which has not been recorded at all here. The only
exception to this rule is coordinated predicate phrases, which have
been indicated as two (or several) different clauses. They are coded
for subject ellipsis. The other categories are used for true gapping,
and often involve deletion of the verb. The different subcategories
indicate what functional elements have been deleted.

(62) Ellipsis of Part of the Verb. Under this heading were
gathered some less interesting deletions, which do not seem to have
a clear textual motivation: deletion of ha 'have' in dependent clauses
or after a modal verb, deletion of göra 'do' or fara 'go'. The
variable could also be used to indicate that the predicate contains
a coordination of infinite verb forms.

(63) Ellipsis of the Head of a Noun Phrase. The subcategories of
this variable are used to indicate whether the ellipsis took place
inside a coordination of noun phrases or between two other noun
phrases, and whether it is the head of the first or the second
phrase which has been deleted.

     Den snälla / och den elaka gossen     - coord., backward
     'The good and the bad boy'

     Den snälla gossen och den elaka /     - coord., forward
     'The good boy and the bad one'

     Den sjuka / såg den friska pojken     - other, backward
     'The sick one saw the healthy boy'

     Den sjuka pojken såg den friska /     - other, forward
     'The sick boy saw the healthy one'

(64) Number of Transformations in the Clause. Only the
transformations listed in the coding key (variables 52-63) were counted.

3. Some preliminary results and hypotheses for future research

The quantitative results yielded by the present system can be
utilized in at least three ways. Firstly, the raw frequencies of the
subcategories of a single variable can be of interest. Secondly, two
variables can be cross-tabulated, which makes it possible to examine
their interdependence. Thirdly, the data can be used to compare two
languages, or two or several texts or genres in the same language.
In this section, some examples of these types of results will be
given. However, these results should be regarded as hypotheses to be
tested by further research, since the pilot study was limited to a
very small material.

3.1 Frequencies of subcategories 

For variable (10), Clausal Construction, the pilot study gave
the following results:

            not co-     first part in   medial part   last part   total
            ordinated   coordination    in coord.     in coord.

simple         167           91             42            77        377
complex         64           15              2            27        108
total          231          106             44           104        485

Out of 485 clauses, 108 or 22.3 % were complex, i.e. contained
one or more dependent clauses, and 254 clauses, that is, 52.4 %, were
coordinated. If we compare the different positions in the coordination,
we find that 15 out of 106 first parts or 14.2 % were complex, while
27 out of 104 last parts in the coordination or 26.0 % were complex. 1
That is, last parts in coordinations are generally more complex than
first parts. This tallies with the principle of end weight. Complex
middle parts in coordination were rare (4.5 %); however, here
the comparison is somewhat misleading, since the middle parts were
compared to first and last parts in general, not only to first and last
parts in a coordination with a middle part. Out of the sentences
which were not coordinated, 64, or 27.7 %, were complex, too. This
figure was comparable to the one for last parts in coordinations.

Variables (18), Subject Form, and (23), Object Form, classify the
subject and the object according to their marking for definiteness. If
we add together the figures for the various indefiniteness markings in
order to get larger categories, we obtain the following diagram:

            missing or   indefinite   definite   irrelevant   total
            deleted

subject        105           35          327          18        485
object         296           38           99          52        485



The high figure for objects with irrelevant definiteness reflects
the fact that objects are sentential much more often than subjects. Out
of the 380 subjects, 35, or 9.2 %, were indefinite, and 327, or 86.1 %,
were definite. The corresponding figures for the 189 objects are 38,
or 20.1 %, indefinite ones, and 99, or 52.4 %, definite ones. That is,
subjects have a much stronger tendency towards definiteness than
objects. This tallies with the text linguistic principle whereby given
elements, which often function as topics, are placed in the beginning
of the clause and high up in the syntactic structure. Similar figures
can be obtained from the variables classifying the givenness of the
subject and the object referents, (19) and (24).

1 Here we find a small coding (or punching) mistake. The number of the
first parts in coordinations should of course equal the number of the
last parts.
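
The percentages for definiteness can be rechecked from the table. The
snippet below is a minimal sketch in Python, not the project's own
programme; the dictionary layout and the function name are assumptions,
while the counts are those of the table. As in the text, the base for
each percentage is the number of subjects or objects actually present,
i.e. the row total minus the missing or deleted items.

```python
# Counts from the definiteness table (subject and object rows).
counts = {
    "subject": {"missing": 105, "indefinite": 35, "definite": 327, "irrelevant": 18},
    "object": {"missing": 296, "indefinite": 38, "definite": 99, "irrelevant": 52},
}


def present(row):
    """Number of items actually present: row total minus missing/deleted."""
    return sum(row.values()) - row["missing"]


def definiteness_shares(row):
    """Percentage of indefinite and definite items among those present."""
    base = present(row)
    return {key: round(100 * row[key] / base, 1) for key in ("indefinite", "definite")}


for role, row in counts.items():
    print(role, present(row), definiteness_shares(row))
```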

3.2 Cross-tabulations

3.2.1 Topicalization and subject length 1

A cross-tabulation was carried out on a subcorpus of the main
declarative clauses, i.e., on cards marked both for one of the
subcategories 1, 2, or 3 of variable (11) and simultaneously for
subcategory 1 of variable (12). The two variables cross-tabulated were
(17), Subject Structure, and (52), Type of Topicalized Constituent.
However, since the topicalizations in the material were relatively few,
the figures for subcategories (1-7) were combined, so as to obtain an
overall figure for topicalization, which can be compared with the
figure for sentences where no topicalization applied (subcategory 0).



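structure                    topicalization   no topicalization   total
of subject

omitted                       [illegible]       [illegible]     [illegible]
formal det                         1                11              12
one word                      [illegible]       [illegible]     [illegible]
with preposed attribute       [illegible]           11          [illegible]
with postposed attribute      [illegible]           12          [illegible]
clause or infinitive               0                 1               1
[illegible]                        3                 7              10
total                            111            [illegible]     [illegible]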
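
The procedure described above (restrict the cards to a subcorpus, then
cross-tabulate two variables with subcategories 1-7 of variable (52)
merged) can be sketched as follows. This is an illustration in Python,
not the programme used in the project; representing a card as a dict
keyed by column number is an assumption, and the example cards are
invented.

```python
from collections import Counter

TOPICALIZED = set("1234567")   # subcategories 1-7 of variable (52), combined


def crosstab_topicalization(cards):
    """Cross-tabulate subject structure (17) against topicalized or not.

    Only main declarative clauses are counted: values 1-3 of variable
    (11) together with value 1 of variable (12).
    """
    table = Counter()
    for card in cards:
        if card["11"] not in {"1", "2", "3"} or card["12"] != "1":
            continue                       # outside the subcorpus
        table[(card["17"], card["52"] in TOPICALIZED)] += 1
    return table


# Invented toy cards, for illustration only (not data from the study).
cards = [
    {"11": "3", "12": "1", "17": "8", "52": "0"},
    {"11": "3", "12": "1", "17": "8", "52": "4"},
    {"11": "1", "12": "1", "17": "5", "52": "0"},
    {"11": "7", "12": "1", "17": "8", "52": "0"},   # relative clause, skipped
    {"11": "3", "12": "2", "17": "8", "52": "0"},   # interrogative, skipped
]
for key, n in sorted(crosstab_topicalization(cards).items()):
    print(key, n)
```

Merging the subcategories before counting, rather than afterwards,
keeps the table small, which matters when the number of topicalized
clauses is as low as in the pilot material.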



1 Ulttlaasfarilar . 
rocaatljr by ftrjatta 
•oat Oata prccots lop tytUm. 



[...] always occupies the initial position. After topicalization, the
subject was placed in the central field, immediately after the finite
verb. That is, as a consequence of topicalization, the subject was
placed inside the clause, with other functional elements on both
sides. Exceptions to that rule were existential clauses, in
which the subject was placed in the object position (i.e., after the
main verb) in all cases. Existential clauses should therefore have
been omitted from this table, but their frequency was so low that
they hardly influence the results.

IH cm mil/ Mt tint Hurt It t ttrofif tM <m c y t» f*»Mr 
topfetli tttiM U* ttAJtct cmiUU of cm mt4 *r ft*t Ml/ 
p rt fto l «ttrtt*t*K tf ttt nftjtct ft fw*4, tr cMtttu •# * cImm 
or m t*fi*ttivt, ar nm cum fit* #f • pntm#l Miiftor* u#k*U • 
mtM Uk« plactt mc* Mfi Ml Amu H» r*tt cnU 1* **** t*t cMtrt! 
f l*W **mM pnfmrMg cotttlt cfaort «Im wiU « lit *tm * c*mtrt*rt*1 
tMdatcjr t« t«*t Obt frl*ct»lt *f Mi mi&t mrkt U 

it MCtlMt * 

3.2.2 Topicalization and the form of the subject

If variable (17), Subject Structure, in the same
cross-tabulation is changed to variable (18), Subject Form, the
following results are obtained:



[Table: form of the subject (rows) by topicalization, no
topicalization, and total (columns). Most cell values are illegible;
the column total for topicalization is 111.]

The table shows that topicalization is relatively frequent when
the subject is indefinite or a proper name, less frequent when it is
a pronoun, and rather infrequent when it is a definite noun phrase of
some other type. This is in accordance with the rule that new referents
tend not to be placed in initial position.

3.2.3 Topicalization and clausal structure

If the subject variable in the preceding cross-tabulations is
changed to variable (13), Clause Structure, the following results are
obtained:

                        topicalization   no topicalization   total

active transitive             61               147            208
active intransitive           43               133            176
passive                        0                 4              4
existential                    2                10             12
other clauses with
a formal subject               5                11             16
fragments                      0                10             10
total                        111               315            426



As we can see from the figures, topicalization is somewhat more frequent
in transitive sentences (29.3 %) than in intransitive sentences (24.4 %).
It is possible that this difference is due to the fact that transitive
clauses contain more constituents which can be topicalized. A way to
test this hypothesis would be to cross-tabulate variable (52), Type of
Topicalized Constituent, and variable (51), Number of Adverbials in the
Clause. However, it should be noted that in active transitive clauses,
29 quotation objects are topicalized. If clauses containing quotations
are disregarded, the difference between transitive and intransitive
sentences might be smaller or non-existent.
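
The two percentages follow directly from the table, and the effect of
the quotation objects can be probed in the same way. The lines below are
only a sketch in Python; the function name is an assumption, and the
adjustment is a rough check rather than a result reported by the study.
It removes just the 29 clauses with a topicalized quotation object, so
the adjusted rate is a lower bound: disregarding further clauses that
contain an untopicalized quotation would shrink only the total.

```python
def rate(topicalized, total):
    """Topicalization rate in per cent, rounded to one decimal."""
    return round(100 * topicalized / total, 1)


transitive = rate(61, 208)      # active transitive clauses
intransitive = rate(43, 176)    # active intransitive clauses

# Hypothetical adjustment: drop the 29 clauses with a topicalized
# quotation object from both the topicalized count and the total.
transitive_without_quotations = rate(61 - 29, 208 - 29)

print(transitive, intransitive, transitive_without_quotations)
```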



3.2.4 Givenness of final adverbial in main and subordinate clauses

Variable (48), Givenness of Final Adverbial, was cross-tabulated
against variable (11), Clause Type. We have here added the figures for
all main clause types, and similarly for all dependent clause types,
in order to obtain larger and more reliable figures:





                        main       dependent   total
                        clauses    clauses

no final adverbial        216         49        265
just mentioned             20          7         27
earlier mentioned          16          9         25
indexical givenness        46          6         52
generally implied           7          1          8
new referent               29          6         35
irrelevant                 49          8         57
known from the
speech situation           10          2         12
total                     393         88        481



We can see from the table that dependent clauses contain a final
adverbial which has been mentioned before about twice as often as main
clauses do (10.2 % of the dependent clauses and 4.1 % of the main
clauses). This seems to support the hypothesis that dependent clauses
contain more given information than main clauses. The tendency to put
a new referent last in the clause seems to be stronger in main clauses.
One of the reasons for this might be that dependent clauses often are
presupposed, referring to previously known facts.
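
The shares quoted above can be recomputed from the table. The following
is a minimal sketch in Python; the dictionary and the helper name are
assumptions, and the counts are those of the table (main clauses first,
dependent clauses second).

```python
# Rows of the givenness table: (main clauses, dependent clauses).
rows = {
    "no final adverbial": (216, 49),
    "just mentioned": (20, 7),
    "earlier mentioned": (16, 9),
    "indexical givenness": (46, 6),
    "generally implied": (7, 1),
    "new referent": (29, 6),
    "irrelevant": (49, 8),
    "known from the speech situation": (10, 2),
}

main_total = sum(main for main, _ in rows.values())
dependent_total = sum(dep for _, dep in rows.values())


def share(label):
    """Per cent of main and of dependent clauses falling in one row."""
    main, dep = rows[label]
    return (round(100 * main / main_total, 1),
            round(100 * dep / dependent_total, 1))


print(main_total, dependent_total)
print("earlier mentioned:", share("earlier mentioned"))
print("new referent:", share("new referent"))
```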

3.3 Comparison between Swedish and Finnish

As indicated in the Introduction, one of the basic aims of our
project was to create a common ground for comparing the occurrence of
textual and syntactic phenomena in different languages. In fact, we
even used the same texts as material for the Finnish and Swedish pilot
studies. (Both texts for the Finnish study were translations of the
texts used for the Swedish coding; cf. Hakulinen & Karlsson,
forthcoming.)

The task of finding a level at which the different languages
would be fully comparable is, of course, a difficult one. Each
language is by itself a phenomenon built up of interacting systems.
And it will not do to try to isolate a particular factor from one
system and compare it with a similar factor in another language
without taking into account the fact that this factor in turn also
belongs to a system; and though we can compare systems in different
languages, they rarely overlap in all their details.

With this in mind we shall here give just one example of what
kinds of information similar coding keys for different languages can
give. (52) Type of Topicalized Constituent. Different types of
functional elements seem to be topicalized with different frequencies
in Swedish and Finnish:



topicalized              Swedish   Finnish
constituent

no topicalization          356       410
quotations                  31
other objects               16        84
adverbial                             92
- from end-field            51
- from central field        21
complement                   6
existential subject          1
verb phrase                  1
total                      485       602



As can be seen from the figures, the object seems to be topicalized
much more frequently in Finnish than in Swedish. (This is true
even if we add the frequencies of topicalized quotations to the
Swedish figure.) One of the most obvious reasons for this is that
Finnish, as opposed to Swedish, is a highly synthetic language, and
can make use of inflectional devices to indicate the grammatical
functions of NPs in a clause. Another difference that could be noted
was that temporal adverbials were topicalized remarkably often in
Swedish. In Finnish, place adverbials were topicalized with the same
frequency as temporal ones.



APPENDIX 1.

Revised version of the TIRO coding key for data processing of Swedish

In this section we wish to give a summary of the variables
discussed in §2 and their use. The numbers on the left refer to
the columns on the punching card, and are indicated here for ease
of reference. Needless to say, the numbering used below is arbitrary,
and can be varied. The different values or subcategories of the
variables are also numbered (1 - 9, followed by A - Z; thus A equals
subclass number 10, etc.). For variables where the subclasses overlap,
we have tried to indicate what we feel to be the 'hierarchical order'
of the subclasses by giving the 'strongest' subclass number 1, etc.
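
How such a card might be read by a program can be sketched briefly. The
code below is an illustration in Python, not the project's software: it
assumes that a card is available as a plain character string with column
1 at index 0, and both function names are hypothetical. The letter
coding (A = 10, etc.) and the identification columns 1-6 are those of
the key.

```python
import string


def decode_value(code):
    """Map a one-character subclass code to its number.

    '1'-'9' stand for 1-9; 'A' stands for 10, 'B' for 11, and so on.
    """
    if code.isdigit():
        return int(code)
    return 10 + string.ascii_uppercase.index(code)


def identification(card):
    """Read the identification columns of one card image."""
    return {
        "language": card[0],           # column 1
        "text": card[1],               # column 2
        "sentence": int(card[2:5]),    # columns 3-5
        "clause": card[5],             # column 6
    }


print(decode_value("7"), decode_value("A"), decode_value("C"))
print(identification("210174"))
```

Keeping every value to a single character is what lets one variable
occupy exactly one card column, at the cost of the letter codes for
subclasses above 9.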

COLUMN  VARIABLE
            VALUES

Identification of Clauses.

1       Language of Text
            1 Finnish
            2 Swedish
2       Identification Number of Text
3-5     Number of Sentence in Text
6       Number of Clause in Sentence

Clausal Variables.

7       Number of Clauses in Entire Sentence
8       Matrix Address of Dependent Clause
9       Status of Clause in Sentence
            1 Single main clause
            2 First main clause of coordination
            3 Main clause medially in coordination
            4 Last main clause of coordination
            5 Dependent clause before its superordinate
            6 Dependent clause inside its superordinate
            7 Dependent clause after its superordinate
              (but not sentence-final)
            8 Dependent clause standing last in a sentence



10      Clausal Construction
            1 Simple clause, not coordinated
            2 Simple, initial part in coordination
            3 Simple, medial part in coordination
            4 Simple, final part in coordination
            5 Complex clause, not coordinated
            6 Complex, initial part in coordination
            7 Complex, medial part in coordination
            8 Complex, final part in coordination
11      Clause Type
            1 Main clause introduced by coordinator
            2 Embedded main clause
            3 Main clause, other than 1 & 2
            4 att-clause not introduced by preposition
            5 Indirect interrogative clause
            6 att-clause introduced by preposition
            7 Subject-modifying relative clause
            8 Object-modifying relative clause
            9 Relative clause modifying other elements in its
              superordinate clause
            A Adverbial clause
            B Comparative clause
            C Clausal contractions
12      Surface Mood of Clause
            1 Affirmative
            2 Interrogative
            3 Imperative
            4 Exclamatory
13      Clause Structure
            1 Active transitive
            2 Active intransitive
            3 Passive with full subject
            4 Existential
            5 Passive with formal subject only
            6 Other clauses with formal subject
            7 Fragments
14      Surface Word Order of Clause
            1 SVX
            2 SXV
            3 VSX
            4 VXS
            5 XSV
            6 XVS
            7 SV
            8 VS
            9 [illegible]
15      Verb Complements
            1 Object (+ Adv)n
            2 Complement (+ Adv)n
            3 (Adv)n

Constituent Variables.

16      Verb Structure
            1 Complex form of phrasal verb in verb chain + sig
            2 Complex form of phrasal verb in verb chain
            3 Complex form of V + sig in verb chain
            4 Complex verb form in verb chain
            5 Phrasal verb in verb chain + sig
            6 Phrasal verb in verb chain
            7 V + sig in verb chain
            8 Single verb form in verb chain
            9 Complex form of phrasal verb + sig
            A Complex form of phrasal verb
            B Complex form of V + sig
            C Complex verb form
            D Phrasal verb + sig
            E Phrasal verb
            F V + sig
            G Simple verb form




17      Subject Structure
            1 Coordination
            2 NP + clause
            3 Clause
            4 Infinitive
            5 NP + postmodifier
            6 Preposed participle + NP
            7 Other premodifier + NP
            8 One word (and possibly an article)
            9 Formal subject
            A Subject gapped
18      Subject Form
            1 Indefinite article
            2 No article
            3 Indefinite quantifier
            4 Definite article
            5 Proper name
            6 Genitive attribute
            7 Definite quantifier
            8 Definite pronoun
            9 Irrelevant
19      Givenness of Subject
            1 Just mentioned
            2 Recently mentioned
            3 Previously mentioned
            4 Indexically given
            5 Pragmatically given
            6 New
            7 Irrelevant
20      Type of Coreference of Subject
            1 Anaphoric pronoun
            2 Cataphoric pronoun
            3 Exophoric pronoun
            4 Repetition
            5 Synonym, paraphrase
            6 Irrelevant
21      Number of Words in Subject Phrase
22      Object Structure                              (cf. 17)
23      Object Form                                   (cf. 18)
24      Givenness of Object                           (cf. 19)
25      Type of Coreference of Object                 (cf. 20)
26      Number of Words in Object Phrase





27      Complement Structure
            1 Coordination
            2 NP + clause
            3 Clause
            4 Infinitive
            5 NP + postmodifier
            6 Preposed participle + NP
            7 Other premodifier + NP
            8 One word (and possibly an article)
            9 Adjective + clause
            A Adjective + adjective coordination
            B Adj + modifier(s)
            C One word adjective
28      Complement Type
            1 Obligatory subject complement
            2 Optional subject complement
            3 Obligatory object complement
            4 Optional object complement
29      Number of Words in Complement
30      Structure of Initial Adverbial
            1 NP/PrepP + clause
            2 (Prep+) Clause
            3 Clausal contraction
            4 (Prep+) Infinitive
            5 Prepositional phrase
            6 NP or Adv Phrase
            7 One word
31      Type of Initial Adverbial
            1 Obligatory adverbial
            2 Frame adverbial
            3 Commentatory adverbial
            4 Negative particle
            5 Special clausal adverbial
            6 Other optional adverbial
32      Semantics of Initial Adverbial
            1 Time
            2 Place
            3 Manner
            4 Degree; measure
            5 Purpose; condition; cause
            6 Instrumental
            7 Agent
            8 Irrelevant



33 Givenness of Initial Adverbial (cf. 19)

34 Type of Coreference of Initial Adverbial (cf. 20)

35 Structure of Central Field Adverbial (cf. 30)

36 Type of Central Field Adverbial (cf. 31)

37 Semantics of Central Field Adverbial (cf. 32)

38 Givenness of Central Field Adverbial (cf. 19)

39 Type of Coreference of Central Field Adverbial (cf. 20)

40 Structure of Last but One Adverbial (cf. 30)

41 Type of Last but One Adverbial (cf. 31)

42 Semantics of Last but One Adverbial (cf. 32)

43 Givenness of Last but One Adverbial (cf. 19)

44 Type of Coreference of Last but One Adverbial (cf. 20)

45 Structure of Final Adverbial (cf. 30)

46 Type of Final Adverbial (cf. 31)

47 Semantics of Final Adverbial (cf. 32)

48 Givenness of Final Adverbial (cf. 19)

49 Type of Coreference of Final Adverbial (cf. 20)


50 Special Clausal Adverbial inside a Functional Element

1 In subject
2 In object
3 In complement
4 In other adverbial

51 Number of Adverbials in the Clause
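
Several adverbial variables in the key carry a "(cf. N)" note, meaning that they reuse the category list given under variable N. A minimal sketch of how such cross-references could be resolved when decoding a coded clause is given below. The table layout, the names CATEGORIES, CF and decode, and the choice of a single character per code are assumptions made for illustration only; they are not taken from the authors' program, and only a few categories are reproduced.

```python
# Hypothetical sketch: the data layout and all names are assumptions,
# not the authors' implementation.

# Category lists for variables that define their own values
# (only a few entries from the key are reproduced here).
CATEGORIES = {
    30: {  # Structure of Initial Adverbial
        "3": "Clausal contraction",
        "7": "One word",
    },
    32: {  # Semantics of Initial Adverbial
        "1": "Time",
        "2": "Place",
        "7": "Agent",
    },
}

# Variables marked "(cf. N)" share the category list of variable N.
CF = {
    35: 30, 40: 30, 45: 30,  # structure of the other adverbial positions
    37: 32, 42: 32, 47: 32,  # semantics of the other adverbial positions
}

def decode(variable, code):
    """Return the category label for a code, following a cf. reference if any."""
    base = CF.get(variable, variable)
    return CATEGORIES[base][code]

# Variable 45 (Structure of Final Adverbial) reuses the list of variable 30.
print(decode(45, "7"))
print(decode(42, "2"))
```

Keeping each shared list in one place means that a correction to, say, the semantic categories applies to all four adverbial positions at once.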



Transformations

52 Type of Topicalized Constituent

1 Object
2 Object as quotation
3 Adverbial from central field position
4 Adverbial from end-field position






6 Subject in substantival clause

7 VP
8 WH question word
9 Relative pronoun

53 Subject Postponed to the End-field

1 NP subject in existential construction
2 NP subject other than 1
3 Clause as subject
4 Infinitive as subject

54 Extraposition and Quantifier Movement

1 Extraposition of part of subject
2 Extraposition of part of object
3 Extraposition of part of complement
4 Extraposition of part of adverbial
5 Extraposition of functional element from clause
6 Movement of quantifier

7 Movement of head, quantifier remains

55 Type of Cleft Construction

1 Subject clefted
2 Object clefted
3 Complement clefted
4 Adverbial clefted
5 Pseudo-cleft



56 Left Dislocation

1 Subject
2 Object
3 Complement
4 Initial adverbial
5 Other adverbial

57 Right Dislocation (cf. 56)

58 (Utttng

1 Subject
2 Object - generic
3 Object - nongeneric
4 Adverbial






59 Passive



1 Iftii <vt wlUttvt ijm 



.jiiltti tpftt it IMtftl 
fmftftl 

fktrti ipmt #i ct*t/*l 
' t*vt rfctit 



60 Verb Phrase Deletion



•fttr 

• 

• •Hfrfctil 11 ***** §l ' §lt M 

1 »£'|»istvt» ipM 4t ftmt ttfmfcltl 

• Mi Niiiw uitJtovt i»m 

7 SlU J»"1v" it fMUit 
W?trt>til 

• Mi fMiitlvti iftnt «s ctntrtl f UI4 
UrtrfcUl 

A Ml pmtvtt ip*t it fifal mmtiil 

I «MJ ptlltVt 



t rorwrtJ, limr 

' 4 ftKkwind, iwitf 

61 Ellipsis of Functional Element (Gapping)

1 Subject
2 Object
3 Adverbial
4 Verb
5 S + O
6 S + V
7 V + O
8 V + A

62 Ellipsis of Part of the Verb

1 Auxiliary in complex form

I ggg 

3fm 

63 Ellipsis of the Head of a Noun Phrase

1 First part of coordinated NP
2 Second part of coordinated NP






9 rim •* *** *t 
c**f4t*tM*Q 



64 Number of Transformations in the Clause
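
The last variable records how many transformations apply to a clause. One way such a count could be derived from the codes of the transformation variables (52 to 63) is sketched below. Whether the authors computed the value this way is not stated in the key; the dictionary layout, the function name, and the use of "0" for "not applied" are all assumptions made for illustration.

```python
# Hypothetical sketch: the record layout and the blank code "0" are assumptions.

# Variables 52..63 each describe one kind of transformation.
TRANSFORMATION_VARIABLES = range(52, 64)

def count_transformations(codes, blank="0"):
    """Count the transformation variables that carry a non-blank code.

    codes maps a variable number to its one-character code; variables
    missing from the mapping are treated as blank.
    """
    return sum(
        1
        for variable in TRANSFORMATION_VARIABLES
        if codes.get(variable, blank) != blank
    )

# A clause coded for topicalization (52) and a cleft construction (55),
# with the passive variable (59) explicitly left blank.
clause = {52: "1", 55: "2", 59: "0"}
print(count_transformations(clause))
```

Codes outside the range 52 to 63 are ignored, so a full clause record could be passed in unchanged.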






